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Abstract — This paper presents a simple and efficient method 
to convolve an image with a Gaussian kernel. The computation 
is performed in a constant number of operations per pixel 
using running sums along the image rows and columns. We 
investigate the error function used for kernel approximation 
and its relation to the properties of the input signal. Based on 
natural image statistics we propose a quadratic form kernel error 
function so that the output image fo error is minimized. We apply 
the proposed approach to approximate the Gaussian kernel by 
linear combination of constant functions. This results in very 
efficient Gaussian filtering method. Our experiments show that 
the proposed technique is faster than state of the art methods 
while preserving a similar accuracy. 

Index Terms — Non uniform filtering, Gaussian kernel, integral 
images, natural image statistics. 



I. Introduction 

IMAGE filtering is an ubiquitous image processing tool, 
which requires fast and efficient computation. When the 
kernel size increases, direct computation of the kernel response 
requires more operations and the process becomes slow. 

Various methods have been suggested for fast convolution 
with specific kernels in linear time (see related work in 
Section [II]). An important kernel is the Gaussian, which is 
used in many applications. 

In this paper we present an efficient filtering algorithm 
for separable non uniform kernels and apply it for very fast 
and accurate Gaussian filtering. Our method is based on one 
dimensional running sums {integral images) along single rows 
and columns of the image. The proposed algorithm is very 
simple and can be written in a few lines of code. Complexity 
analysis, as well as experimental results show that it is faster 
than state of the art methods for Gaussian convolution while 
preserving similar approximation accuracy. 

An additional contribution of this paper is an analysis 
of the relation between the kernel approximation and the 
approximation of the final output, the filtered image. Usually, 
the approximation quality is measured in terms of the dif- 
ference between the kernel and its approximation. However, 
minimizing the kernel error does not necessarily minimize the 
error on the resulting image. To minimize this error, natural 
image statistics should be considered. 



In Section IV we investigate the kernel approximation error 
function. Based on natural image statistics we find a quadratic 
form kernel error measurement which minimizes the 1% error 
on the output pixel values. 

The next section reviews related methods for fast non uni- 



form image filtering. Section III presents the filtering algorithm 
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and discusses its computational aspects. Section IV discusses 
the relation between kernel approximation and natural image 
statistics. Section |V| pr esents our experimental results. We 
conclude in Sectionlvll 

II. Related Work 

In the following we review the main approaches to accel- 
erate image filtering. We describe in more detail the integral 
image based approach on which the current work is based. 

General and Multiscale Approaches. Convolving an im- 
age of N pixels with an arbitrary kernel of size K can be 
computed directly in O(NK) operations, or using the Fast 
Fourier Transform in 0(N log K) operations 0. 

A more efficient approach is the linear time multiscale 
computation using image pyramids 0, 0. The coarser image 
levels are filtered with small kernels and the results are inter- 
polated into the finer levels. This approximates the convolution 
of the image with a large Gaussian kernel. 

Recursive Filtering. The recursive method is a very ef- 
ficient filtering scheme for one dimensional (or separable) 
kernels. The infinite impulse response (IIR) of the desired 
kernel is expressed as a ratio between two polynomials in 
Z space 0. Then the convolution with a given signal is 
computed by difference equations. Recursive algorithms were 
proposed for approximate filtering with Gaussian kernels |5], 
0, 0, 0, 0, OH, anisotropic Gaussians |11| and Gabor 
kernels (Y2\ . 

The methods of Deriche 0, 0, and Young and van 
Vliet 0, are the current state of the art for fast approxi- 
mate Gaussian filtering. Both methods perform two passes in 
opposite directions, in order to consider the kernel response 
both of the forward and backward neighbors. The method of 
Young and van Vliet 's requires less arithmetic operations per 
pixel. However, unlike Deriche's method the two passes cannot 
be parallelized. 

Tan et al. |[T3ll evaluated the performance of both methods 
for small standard deviations using normalized RMS error. 
While Deriche's impulse response is more accurate, Young 
and van Vliet performed slightly better on a random noise 
image. No natural images were examined. Section |V| provides 
further evaluation of these methods. 

Integral Image Based Methods. Incremental methods such 
as box filtering [14] and summed area tables, known also as 
integral images lfl5l , (T6), cumulate the sum of pixel values 
along the image rows and columns. In this way the sum of 
a rectangular region can be computed using 0(1) operations 
independent of its size. This makes it possible to convolve an 
image very fast with uniform kernels. 



Heckbert ifTTIl generalized integral images for polynomial 
kernels of degree d using d repeated integrations. Derpanis et 
al. [Q~8 ] applied this scheme for a polynomial approximation 
of the Gaussian kernel. Werman lfT9l introduced another 
generalization for kernels which satisfy a linear homogeneous 
equation (LHE). This scheme requires d repeated integrations, 
where d is the LHE order. 

Hussein et al. |20| proposed Kernel Integral Images (KII) 
for non uniform filtering. The required kernel is expressed 
as a linear combination of simple functions. The convolu- 
tion with each such functions is computed separately using 
integral images. To demonstrate their approach, the authors 
approximated the Gaussian kernel by a linear combination of 
polynomial functions based on the Euler expansion. Similar 
filtering schemes were suggested by Marimon [21 ] who used 
a combination of linear functions to form pyramid shaped ker- 
nels and by Elboher and Werman [ 22 ] who used a combination 
of cosine functions to approximate the Gaussian and Gabor 
kernels and the bilateral filter (23). 

Stacked Integral Images. The most relevant method to this 
work is the Stacked Integral Images (SII) proposed by Bhatia 
et al. |[24l . The authors approximate non uniform kernels by a 
'stack' of box filters, i.e. constant 2D rectangles, which are all 
computed from a single integral image. The simplicity of the 
used function and not using multiple integral images makes 
the filtering very efficient. 

The authors of SII demonstrated their method for Gaus- 
sian smoothing. However, they approximated only specific 
2D kernels, and found for each of them a local minima of 
a non-convex optimization problem. Although the resulting 
approximations can be rescaled, they are not very accurate (see 
Section [V]). Moreover, the SII framework does not exploit the 
separability of the Gaussian kernel. 

As shown in this paper, utilizing the separability property 
we find an optimal kernel approximation which can be scaled 
to any standard deviation. As shown in Figure [2| this ap- 
proximation is richer and more accurate. Actually, separable 
filtering of the row and the columns by k one dimensional 
constants is equivalent to filtering by 2 k — 1 two dimensional 
boxes. The separability property can also be used for an 
efficient computation both in time and space, as described in 
Section [Till 

III. Algorithm 

A. Piecewise Constant Kernel Decomposition 

Consider the convolution f*K of a function / with a kernel 
K. For simplicity we first discuss the case in which / and 



K are one dimensional. In the following (Section III-D) we 
generalize the discussion to higher dimensions. 

Suppose that the support of the kernel K is r, i.e. that K is 
zero outside of [0, r]. Assume also that we are given a partition 
P = (Po,Pi, ...Pfc), in which < p < P\ < V2 ••• < Pk-i < 
Pk < r. Thus, the kernel K can be approximated by a linear 
combination of k simple functions Ki, i = l...k: 
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Fig. 1. Approximation of the ID Gaussian kernel on [— 7r,7r] by k = 4 
piecewise constant functions. The dashed lines show the equivalence between 
4 constant 'slices' and 7 constant 'segments' (Equation [4). 




(c) Proposed approximation (4 con- (d) Proposed approximation (5 con- 
stants), stants). 

Fig. 2. Comparison of (a) exact Gaussian kernel, (b) Stacked Integral 
Images |24 1 with 5 2D boxes, and the proposed method with 4 constants (c) 
and 5 constants (d). Our proposed approximation is richer and more accurate 
since it utilizes the Gaussian separability. Instead of using 2D boxes, we use 
ID segments to filter the rows and then the columns. 



Using the above approximation 



f*K^^f*K t 



(2) 



2=1 



Ki{t) 







tfPi-i <t <Pi 
otherwise 



(1) 



which can be computed very efficiently, as described in 
Section UTTCl 

B. Symmetric Kernels 

Consider the case in which K is a symmetric kernel on 
[—r,r]. The approximation can be limited to the range [0,r]. 



Method 


Additions 


Multiplications 


Direct 


h + w - 2 


h -\- w 


FFT 


0(log(hw)) 


0(log(hw)) 


KII |20| 


53 


18 


CII |22| 


12&-8 


8fc-8 


SII |24| 


4fc + l 


fc 


Deriche |7| 


8k -2 


8k 


Young and 
van Vliet QD 


4fc 


4k + 4 


Proposed method 


4fc 


2fc 



TABLE I 
Arithmetic operations per image pixel (Section |IH-D| >. 



Each constant Ci can be used both in the negative interval 
[~Pij —Pi-i] and in the positive one [p^_i,p^, however, this 
requires 2fc constant function. Actually, the same approxima- 
tion can be computed using 'weighted slices': 



Si(t) 



-{ 



Wi if -pi <t < pi 
otherwise 



(3) 



where Wi — ci — q_i. Now the kernel K is approximated 
by sum of k constant functions which are non zero in the 
overlapping intervals [— Pi,Pi] 9 with weights Wi = C{ — q_i. 
The approximation of K remains the same: 



i=l i:—pi<t<pi 



J2 (Ci-Ci-i) 

i:—pi<t<pi 



= J2 * = !>*(*) 

i:pi-i<\t\<pi i=l 

This equality is illustrated in Figure [T] 



(4) 



C. 7£> Filtering Algorithm 

In the following we describe the algorithm for the case of 
a symmetric kernel K (Section |III-B| ). 

Given a ID discrete function f{x) and k piecewise con- 
stant functions Si, we compute the approximated convolution 
(Equation [2} as follows: 

1) Compute the cumulative sum of f(x), 



I(x) 



E 

x'=0 



fix') 



(5) 



2) Compute the convolution result for each pixel x, 



k 

E 



Wi(I(x +Pi) - I(x - Pi - 1)) 



(6) 



The total cost of steps 1 and 2 is 2k additions and k 
multiplications per image pixel. In the ID case memory access 
is not an issue, as all the elements are usually in the cache. 



D. Higher Dimensions and More Computational Aspects 

We now describe the case of convolving a 2D image f(x,y) 
with a separable 2D kernel K. A kernel K is separable if it 
can be expressed as a convolution of ID filters K x * Ky . The 
convolution / * K can be computed by first convolving the 
image rows with K x and then the columns of the intermediate 
result by K y . Hence, filtering an image with a separable 2D 
kernel only doubles the computational cost of the ID case. 

The space complexity is also very low. Since the convolution 
of K x with each row (and also K y with each column) is 
independent, filtering an image of size n x m requires only 
0(max(n, m)) additional space over the input and the output 
images for storing the cumulative sum of a single row or 
column. 

Similarly to the 2D case, the filtering scheme can be ex- 
tended to d-dimensional separable kernels using 2dk additions 
and dk multiplications per single pixel. The additional required 
space is 0(max i (n i )), where rii (i = l...d) are the sizes of 
the signal dimensions. 

Table [I] counts the required operations per image pixel for 
2D image filtering by a hxw separable kernel. The parameter 
k denotes the number of terms (constants, 2D boxes etc.) 
depending on the specific method. Notice that experimentally 
our proposed method with 3 constants is more accurate than 
SII [24) with 5 boxes and has an accuracy similar to the 
recursive methods with 3 coefficients (Section |VJ Figure 3(b) ). 
This means that our proposed method requires less operations 
than Deriche |7] and all the integral image based methods, 
and a comparable number of operations to Young and van 
Vliet 00. 

The proposed method is also convenient for parallel com- 
putation. The summation step (Equation [5]) can be parallelized 
as described in Section 4.3 of |25], while the next step (Equa- 
tion [6]) computes the response of each pixel independently of 
its neighbors. On the other hand, the recursive methods can 
be parallelized only partially - e.g. by filtering different rows 
independently. However, within each row (or column) all the 
computations are strongly dependent. 

Notice also that Equation [6] can be computed for a small 
percentage of the image pixels. This can accelerate applica- 
tions in which the kernel response is computed for sampled 
windows such as image downscaling. The recursive methods 
do not have this advantage, since they compute the kernel 
response of each pixel using the responses of its neighbors. 

IV. Kernel Approximation 

The approximation of a one dimensional kernel K(t) by 
k constant functions is determined by the partition P and 
the constants Q. Finding the best parameters is done by 
minimizing an approximation error function on the desired 
kernel. Indeed, our real purpose is to minimize the error 
on the output image, which is not necessarily equivalent. 



Section |IV-A| relates the kernel approximation error and the 
output image error using natural image statistics. We define a 
quadratic form error function for the kernel approximation, so 
that the output image l<z error is minimized. This error function 



is used in Section IV-B to approximate the Gaussian kernel. 



A. Minimal Output l 2 Error 

We denote the input signal by x, the output signal by y, and 
the kernel weights by w. The approximated output and kernel 
weights are denoted by y and w respectively. The upper case 
letter X denotes the Fourier transforms of the input x. 

The squared l 2 error of the final result is given by 



e 2 = £>< - mf 



(7) 



Since yi = V ■ WjXi+j, E2 can be expressed in terms of the 
input signal x and the kernels w, w: 



E 2 = J2 (Yl W J Xi +i " J2 ™3 x i+j ) 

i 3 3 



(8) 



i,3,k 

E 

3,k 



((wj - Wj)(w k - w k )^2x i+j x i+k 



The only term which involves the input signal x is the inner 
sum, which we denote as Aj k = J^i x i+j x i+k- 

Note that Aj k is the the autocorrelation of x at location 
j—k. The error can be therefore expressed as 

E 2 = (w- w) T A(w - w) (9) 

where A's entries are given by the autocorrelation of the signal 
x. 

In order to make E 2 independent of the values of a specific 
x, we make use of a fundamental property of natural images 
introduced by Field |26] the Fourier spectrum of a natural 
image in each frequency u is proportional to ^, 



\XJ oc 



1 



(10) 



Applying the Convolution Theorem, the autocorrelation of x 
is 



\X U 



oc 



(11) 



Hence, Aj k should be proportional to <£, the inverse Fourier 
transform of \. 

Of course, the definition of <£ is problematic since -^ is 
not defined for u = 0. Completing the missing value by 
an arbitrary number affects all <E> values. However, additional 
information is available which helps to complete the missing 
value. 

Assuming that the values of x are uniformly distributed in 
[0, 1], we find that <£ (which equals to Ajj, the correlation of a 
pixel with itself) is proportional to J Q x 2 dx = |. In addition, 
assuming that distant pixels are uncorrelated, the boundary 
values <£ r ,<£_ r are proportional to /i^ = \. This means that 
the ratio -^- should be | . Determining the missing zero value 
in the Fourier domain as 16.5 results in such a function <l>. 



As shown by our experiments (Section |VJ Figure [3(b)] ), ap- 
proximating the Gaussian kernel using the proposed quadratic 



form (Equation [9]) where Aj k = <&j- k results in a very low 
l 2 error (or high PSNR) on the output image. Modifying <£ 
values slightly decreases the accuracy. 

B. Gaussian Approximation 

The Gaussian kernel is zero mean and its only parameter 
is standard deviation a. Actually, all the Gaussian kernels are 
normalized and scaled versions of the standard kernel exp(y). 
Therefore the approximation need to be computed only once 
for some a . For other a values the pre-computed solution of 
N(t) is simply rescaled. 

We approximate the Gaussian within the range [— 7rcr, ira], 
since for greater distances from the origin kernel values are 
negligible. The optimization is done on the positive part 
[0,7nr]. Then the kernel symmetry is used to getcompute the 



weights Wi as described in Section III-B 



The partition indices pi and constants C{ were found by an 
exhaustive search using 100 samples of the Gaussian kernel 
with a = ^. To get the parameters for other a values, we 
scale the indices and round them: 



» 



The weights are also scaled, 



» 



•Pi 



Pz 



2p 



to 



(12) 



(13) 



The parameters for different numbers of constants are pre- 
sented in Table [TTJ Figure [T] shows the optimal approximation 
with 4 constant functions. Figure |2jc,d) present the 2D result 
of filtering image rows and columns using the proposed 
approximation. 



#constants (k) 


partition indices (pi) 


weights (a) 


k = 3 


23, 46, 76 


0.9495, 0.5502, 0.1618 


k = A 


19, 37, 56, 82 


0.9649, 0.6700, 0.3376, 0.0976 


k = 5 


16, 30, 44, 61, 85 


0.9738, 0.7596, 0.5031, 
0.2534, 0.0739 



TABLE II 
Algorithm parameters. 



V. Experimental Results 



In order to evaluate the performance Gaussian filtering 
methods we made an experiment on natural images from 
different scenes. We used a collection of 20 high resolution 
images (at least 1 megapixel) taken from Flickr under creative 
commons. Each of the image was filtered with several standard 
deviation (a) values. Speed and accuracy are measured in 
comparison to the exact Gaussian filtering performed by the 
'cvSmooth' function of the OpenCV library (27). The output 
image error is measured by the peak signal-to-noise ratio 
(PSNR). This score is defined as -10log 10 (MSE), MSE 
is the mean squared error and the image values are in [0,1]. 

We examined our proposed method as well as the state of 
the art methods of Deriche |7] and Young and van Vliet 0. 
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Fig. 3. Experimental results (Section [VJ. (a) Time speedup is in comparison to exact filtering. Our proposed method with k = 3 is the fastest. Using 
k = 5 is similar to Young and van Vliet, while Deriche and SII are slower, (b) Accuracy of the output image: the highest PSNR (minimal 1 2) is achieved by 
our proposed quadratic form approximation with k = 4 or k = 5 which are slightly better than Deriche. Using k = 3 is still better than all other methods 
including Young and van Vliet. Approximating the Gaussian kernel by I2 instead of the proposed quadratic form decreases the approximation accuracy. Kernel 
Integral Images (KII) gives poor results for small a values since integral images of polynomials are numerically unstable for high ratio between a and the 
image size. 



We also examined the integral image based methods of Hus- 
sein et al. (KII, 1201 ). Elboher and Werman (CII, El) and 
Bhatia et al. (SII, l24ll ). which use a similar approach to the 
proposed method (Section [II]). All the compared methods were 
implemented by uqj except Young and van Vliet's filtering, 
for which we adopted the code of |11). This implementation 
is based on the updated design of Young and van Vliet in lfT2l 
with corrected boundary conditions [28]. 

In order to examine the effect of using different error func- 

1 The CImg library contains an implemetation of Deriche' s method for 
k = 2 coefficients. Due to caching considerations, our implementation is 
about twice as fast. For the integral image based methods there is no public 
implementation except CII |22| which was proposed by the authors of the 
current paper. 



tions for kernel approximation, we tested our proposed method 
with two sets of parameters. The first set was computed 
by minimizing the quadratic form error function defined in 



Section IV-A The second set was computed by minimizing 
the I2 error. Since the filtering technique is identical the speed 
is the same, the difference is in the output accuracy. 

Figure [3] presents the average results of our experiments. 
The highest acceleration is achieved by our proposed method 
with k = 3 constants (Figure 3(a) ). Using k = 4 is still faster 
than all other methods, while k = 5 has equivalent speedup 
as Young and van Vliet's method. 

The best accuracy is achieved by our quadratic form approx- 
imation with k = 4, 5, which is slightly better than Deriche 's 



method (Figure 3(b)). Using I2 kernel approximation decreases 



the quality of our proposed method. This demonstrates the 
importance of using an appropriate kernel approximation. 
However, these results are still similar to Young and van Vliet 
and better than other integral image based methods. 

VI. Conclusion 

We present a very efficient and simple scheme for filtering 
with separable non uniform kernels. In addition we analyze 
the relation between kernel approximation, output error and 
natural image statistics. Computing an appropriate Gaussian 
approximation by the proposed filtering scheme is faster than 
the current state of art methods, while preserving similar 
accuracy. 
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